
    Learning Conventions in Multiagent Stochastic Domains using Likelihood Estimates

    Fully cooperative multiagent systems - those in which agents share a joint utility model - are of special interest in AI. A key problem is that of ensuring that the actions of individual agents are coordinated, especially in settings where the agents are autonomous decision makers. We investigate approaches to learning coordinated strategies in stochastic domains where an agent's actions are not directly observable by others. Much recent work in game theory has adopted a Bayesian learning perspective on the more general problem of equilibrium selection, but tends to assume that actions can be observed. We discuss the special problems that arise when actions are not observable, including effects on rates of convergence, and the effect of action failure probabilities and asymmetries. We also use likelihood estimates as a means of generalizing fictitious play learning models in our setting. Finally, we propose the use of maximum likelihood as a means of removing strategies from consideration, with the aim of convergence to a conventional equilibrium, at which point learning and deliberation can cease. Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
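
    A minimal sketch of the fictitious-play baseline that this abstract generalizes, written for a two-agent, common-payoff coordination game with fully observable actions; the payoff matrix and smoothing are illustrative, and the paper's likelihood-estimate variant for unobservable actions is not reproduced here.

        import numpy as np

        # Common payoff for a 2x2 coordination game: both agents prefer to match.
        PAYOFF = np.array([[1.0, 0.0],
                           [0.0, 1.0]])

        # counts[i][a] = (smoothed) number of times agent i has been seen playing action a
        counts = [np.ones(2), np.ones(2)]

        def best_response(agent, counts):
            """Best-respond to the empirical mixed strategy of the other agent."""
            other = 1 - agent
            belief = counts[other] / counts[other].sum()
            expected = PAYOFF @ belief if agent == 0 else PAYOFF.T @ belief
            return int(np.argmax(expected))

        for _ in range(100):
            actions = [best_response(0, counts), best_response(1, counts)]
            for i, a in enumerate(actions):
                # With unobservable actions, this direct count update is exactly what
                # the paper replaces with likelihood estimates.
                counts[i][a] += 1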

    Modal Logics for Qualitative Possibility and Beliefs

    Possibilistic logic has been proposed as a numerical formalism for reasoning with uncertainty. There has been interest in developing qualitative accounts of possibility, as well as an explanation of the relationship between possibility and modal logics. We present two modal logics that can be used to represent and reason with qualitative statements of possibility and necessity. Within this modal framework, we are able to identify interesting relationships among possibilistic logic, beliefs and conditionals. In particular, the most natural conditional definable via possibilistic means for default reasoning is identical to Pearl's conditional for ε-semantics. Comment: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI 1992).
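
    For background, the qualitative accounts discussed here sit on top of the standard possibility-theoretic reading of a possibility measure Π and its dual necessity measure N; these are textbook axioms rather than the paper's specific modal logics:

        \Pi(\bot) = 0, \qquad \Pi(\top) = 1, \qquad \Pi(A \lor B) = \max\{\Pi(A), \Pi(B)\}, \qquad N(A) = 1 - \Pi(\lnot A)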

    The Probability of a Possibility: Adding Uncertainty to Default Rules

    We present a semantics for adding uncertainty to conditional logics for default reasoning and belief revision. We are able to treat conditional sentences as statements of conditional probability, and express rules for revision such as "If A were believed, then B would be believed to degree p." This method of revision extends conditionalization by allowing meaningful revision by sentences whose probability is zero. This is achieved through the use of counterfactual probabilities. Thus, our system accounts for the best properties of qualitative methods of update (in particular, the AGM theory of revision) and probabilistic methods. We also show how our system can be viewed as a unification of probability theory and possibility theory, highlighting their orthogonality and providing a means for expressing the probability of a possibility. We also demonstrate the connection to Lewis's method of imaging. Comment: Appears in Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI 1993).
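
    For context, ordinary conditionalization breaks down exactly where this semantics is aimed: when the evidence A has probability zero. Lewis's imaging instead shifts each world's probability mass to its closest A-world. Both formulas below are standard background (and assume a unique closest A-world), not the paper's counterfactual-probability construction:

        P(B \mid A) = \frac{P(A \land B)}{P(A)} \quad \text{(undefined when } P(A) = 0\text{)}

        P_A(B) = \sum_{w} P(w)\, \mathbf{1}[f_A(w) \models B], \qquad f_A(w) = \text{the } A\text{-world closest to } w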

    Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers

    Scoring rules for eliciting expert predictions of random variables are usually developed assuming that experts derive utility only from the quality of their predictions (e.g., score awarded by the rule, or payoff in a prediction market). We study a more realistic setting in which (a) the principal is a decision maker and will take a decision based on the expert's prediction; and (b) the expert has an inherent interest in the decision. For example, in a corporate decision market, the expert may derive different levels of utility from the actions taken by her manager. As a consequence, the expert will usually have an incentive to misreport her forecast to influence the choice of the decision maker if typical scoring rules are used. We develop a general model for this setting and introduce the concept of a compensation rule. When combined with the expert's inherent utility for decisions, a compensation rule induces a net scoring rule that behaves like a normal scoring rule. Assuming full knowledge of expert utility, we provide a complete characterization of all (strictly) proper compensation rules. We then analyze the situation where the expert's utility function is not fully known to the decision maker. We show bounds on: (a) expert incentive to misreport; (b) the degree to which an expert will misreport; and (c) decision maker loss in utility due to such uncertainty. These bounds depend in natural ways on the degree of uncertainty, the local degree of convexity of the net scoring function, and natural properties of the decision maker's utility function. They also suggest optimization procedures for the design of compensation rules. Finally, we briefly discuss the use of compensation rules as market scoring rules for self-interested experts in a prediction market. Comment: 11 pages, 4 figures, pdflatex. See http://www.cs.toronto.edu/~cebly/papers.htm
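
    For reference, a "typical scoring rule" S of the kind contrasted with here is (strictly) proper when truthful reporting (uniquely) maximizes the expert's expected score; the quadratic (Brier) score is the standard example. These are textbook definitions, not the paper's compensation rules:

        \mathbb{E}_{x \sim q}\bigl[S(q, x)\bigr] \;\ge\; \mathbb{E}_{x \sim q}\bigl[S(p, x)\bigr] \quad \text{for all reports } p, \qquad S_{\mathrm{quad}}(p, x) = 2\,p(x) - \sum_{y} p(y)^2 .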

    Value-Directed Belief State Approximation for POMDPs

    We consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated state) are not necessarily appropriate for POMDPs. Instead we propose a framework for analyzing value-directed approximation schemes, where approximation quality is determined by the expected error in utility rather than by the error in the belief state itself. We propose heuristic methods for finding good projection schemes for belief state estimation - exhibiting anytime characteristics - given a POMDP value function. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) associated with acting in accordance with a given belief state approximation. Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000).
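
    A minimal sketch of the exact belief-state monitoring step that such approximation schemes stand in for; the array layout of the transition and observation models is an illustrative convention, not the paper's notation.

        import numpy as np

        def belief_update(b, a, o, T, Z):
            """Exact Bayesian belief update for a discrete POMDP.

            b: current belief over states, shape (S,)
            T: transition model, T[a, s, s_next] = P(s_next | s, a)
            Z: observation model, Z[a, s_next, o] = P(o | s_next, a)
            """
            predicted = T[a].T @ b              # P(s_next | b, a)
            unnormalized = Z[a][:, o] * predicted
            return unnormalized / unnormalized.sum()

    A value-directed scheme would replace the result with a cheaper projection chosen so that the induced loss in expected utility, rather than the raw belief error, stays small.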

    Practical Linear Value-approximation Techniques for First-order MDPs

    Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition. Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006).

    Approximate Linear Programming for First-order MDPs

    We introduce a new approximate solution technique for first-order Markov decision processes (FOMDPs). Representing the value function linearly w.r.t. a set of first-order basis functions, we compute suitable weights by casting the corresponding optimization as a first-order linear program and show how off-the-shelf theorem prover and LP software can be effectively used. This technique allows one to solve FOMDPs independent of a specific domain instantiation; furthermore, it allows one to determine bounds on approximation error that apply equally to all domain instantiations. We apply this solution technique to the task of elevator scheduling with a rich feature space and multi-criteria additive reward, and demonstrate that it outperforms a number of intuitive, heuristically guided policies. Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005).
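
    For reference, the ground (propositional) ALP that this line of work lifts to the first-order case restricts the value function to a weighted sum of basis functions b_i and picks the weights w with a linear program; the formulation below is the standard ground version with state-relevance weights α, and omits the first-order lifting:

        V_w(s) = \sum_i w_i\, b_i(s), \qquad \min_{w} \; \sum_{s} \alpha(s)\, V_w(s) \quad \text{s.t.} \quad V_w(s) \;\ge\; R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_w(s') \quad \forall s, a .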

    Vector-space Analysis of Belief-state Approximation for POMDPs

    We propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring that have a small impact on decision quality. Using a vector space analysis of the problem, we devise two new search procedures for selecting an approximation scheme that have much better computational properties than existing methods. Though these provide looser error bounds, we show empirically that they have a similar impact on decision quality in practice, and run up to two orders of magnitude more quickly. Comment: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI 2001).
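
    One bound of the value-directed flavor, for a single value vector α and an approximated belief, follows directly from Hölder's inequality; it is generic background rather than the specific vector-space bounds developed in the paper:

        \bigl|\alpha \cdot b - \alpha \cdot \tilde{b}\bigr| \;\le\; \|\alpha\|_{\infty}\, \|b - \tilde{b}\|_{1} .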

    Regret Minimizing Equilibria and Mechanisms for Games with Strict Type Uncertainty

    Mechanism design has found considerable application to the construction of agent-interaction protocols. In the standard setting, the type (e.g., utility function) of an agent is not known by other agents, nor is it known by the mechanism designer. When this uncertainty is quantified probabilistically, a mechanism induces a game of incomplete information among the agents. However, in many settings, uncertainty over utility functions cannot easily be quantified. We consider the problem of incomplete information games in which type uncertainty is strict or unquantified. We propose the use of minimax regret as a decision criterion in such games, a robust approach for dealing with type uncertainty. We define minimax-regret equilibria and prove that these exist in mixed strategies for finite games. We also consider the problem of mechanism design in this framework by adopting minimax regret as an optimization criterion for the designer itself, and study automated optimization of such mechanisms. Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI 2004).
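
    The decision criterion adopted here, stated for a single agent choosing an action a under strict uncertainty about the opponents' types θ ∈ Θ (notation illustrative):

        \mathrm{MR}(a) \;=\; \max_{\theta \in \Theta} \Bigl[\, \max_{a'} u(a', \theta) \;-\; u(a, \theta) \,\Bigr], \qquad a^{*} \in \arg\min_{a} \mathrm{MR}(a) .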

    Regret-based Reward Elicitation for Markov Decision Processes

    The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information using bound queries, using regret-reduction as a means for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function. Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI 2009).
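
    A compact statement of the minimax-regret criterion over the set R of reward functions consistent with the information elicited so far (notation illustrative; the paper's computational methods are not reproduced):

        \mathrm{MR}(\pi, \mathcal{R}) \;=\; \max_{r \in \mathcal{R}} \; \max_{\pi'} \Bigl[ V^{\pi'}_{r} - V^{\pi}_{r} \Bigr], \qquad \pi^{*} \in \arg\min_{\pi} \mathrm{MR}(\pi, \mathcal{R}) .

    A bound query ("is r(s, a) at least c?") shrinks the set R, and since the inner maximization then ranges over a smaller set, minimax regret can only stay the same or decrease.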